Storage pool capacity management

ABSTRACT

Embodiments relate to a pool of persistent storage volumes. Capacity of the volumes is managed to ensure continued operation and function of the volumes with respect to their corresponding storage pool capacity threshold(s). One or more space savings techniques are selectively performed on a copy of a selected volume. Such techniques include measurement of capacity change and measurement of workload performance change. These measurements are leveraged to produce a subset of space reduction actions for execution. A space reduction action in the form of compression or thinning takes place on-demand on a corresponding volume.

CROSS-REFERENCE TO RELATED APPLICATION

The present application is a continuation patent application claiming the benefit of U.S. patent application Ser. No. 14/675,151, filed on Mar. 31, 2015, and titled “Storage Pool Capacity Management” now pending, the entire contents of which are hereby incorporated by reference.

BACKGROUND

The present invention relates to persistent storage capacity management. More specifically, the invention relates to one or more processes for space savings impact analysis and evaluation and for execution of a space saving technique.

Compression and thin provisioning are techniques used in data centers to reduce storage capacity usage, also known as a storage footprint, thereby making more storage available. Storage administrators can specify whether a storage volume is compressed, thick, or thin, per a management policy not only at initial provisioning time but also during steady state lifecycle. However, reducing storage capacity using any of the above techniques may have a negative impact on application performance. For example, reading from a compressed storage volume requires the volume to be subject to a de-compression technique, which requires additional processing. At the same time, reading from a thinned volume may also require additional processing, such as metadata lookup prior to data access. Reading data from either a compressed volume or a thinned volume introduces I/O latency.

There is a balance between performance of data storage techniques and application of data storage techniques. Performance impacts and capacity savings are functions of a workload and vary widely across different workload types. For example, compressing or thinning a volume has a minimal benefit if there is ample available space in the storage pool in which the application volume resides. As such, storage footprint reducing techniques, such as compression and thin provisioning, are desirable for application to free up storage capacity in storage pools that are near or have surpassed a capacity threshold.

SUMMARY

The invention includes a method, computer program product, and system for management of one or more storage pools containing one or more volumes with respect to availability of storage space.

In one aspect, a method is provided to manage capacity of storage volumes in a storage pool. Live statistics associated with storage pool volumes operating in a first state are maintained. A list of candidate volumes in a storage pool for space reduction is maintained. Characteristics of the list are directed at capacity and performance. The volumes in the list are subject to prioritization as a function of storage growth projection. At such time as a volume from the storage pool is selected, a space saving reduction action on a copy of the selected volumes is performed. This action includes measuring volume capacity change, measuring workload performance change, and producing a subset of optimal space reduction actions for execution. At least one action from the subset is executed on-demand, thereby converting the first state to a second state, wherein the second state is either compressed or thinned.

In another aspect, a computer program product is provided to manage capacity of a storage pool. The computer program product includes a computer readable storage device having embodied program code executable by a processing unit. The program code processes non-compressed data and maintains live statistics associated with storage pool volumes operating in a first state. A list of candidate volumes in a storage pool for space reduction is maintained. Characteristics of the list are directed at capacity and performance. Program code is provided to subject the volumes in the list to prioritization as a function of storage growth projection. At such time as a volume from the storage pool is selected, the program code performs a space saving reduction action on a copy of the selected volumes. This action includes measuring volume capacity change, measuring workload performance change, and producing a subset of optimal space reduction actions for execution. Program code performs on-demand execution of at least one action from the subset, thereby converting the first state to a second state, wherein the second state is either compressed or thinned.

In yet another aspect, a computer system is provided to manage capacity of a storage pool. The system includes a processing unit operatively coupled to memory, and a storage pool with two or more storage volumes, operatively coupled to the processing unit. A management tool is provided in communication with the processing unit to manage capacity of the storage pool. The tool maintains statistics associated with storage pool volumes operating in a first state, the statistics including capacity and performance. The volumes in the list are subject to prioritization as a function of storage growth projection. At such time as a volume from the storage pool is selected, a space saving reduction action on a copy of the selected volumes is performed. This action includes measuring volume capacity change, measuring workload performance change, and producing a subset of optimal space reduction actions for execution. At least one action from the subset is executed on-demand, thereby converting the first state to a second state, wherein the second state is either compressed or thinned.

These and other features and advantages will become apparent from the following detailed description of the presently preferred embodiment(s), taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The drawings referenced herein form a part of the specification. Features shown in the drawings are meant as illustrative of only some embodiments of the invention, and not of all embodiments of the invention unless otherwise explicitly indicated.

FIG. 1 depicts a flow chart illustrating an overview of the decoupling process.

FIG. 2 depicts a flow chart illustrating a process for estimating space savings.

FIG. 3 depicts a flow chart illustrating a process for predicting a storage threshold violation.

FIG. 4 depicts a flow chart illustrating managing storage capacity.

FIG. 5 depicts a block diagram illustrating components of a storage pool capacity management system.

FIG. 6 depicts an example of a cloud computing node.

FIG. 7 depicts a cloud computing environment.

FIG. 8 depicts a set of functional abstraction layers provided by the cloud computing environment.

DETAILED DESCRIPTION

It will be readily understood that the components of the present invention, as generally described and illustrated in the Figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the apparatus, system, and method of the present invention, as presented in the Figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.

Reference throughout this specification to “a select embodiment,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “a select embodiment,” “in one embodiment,” or “in an embodiment” in various places throughout this specification are not necessarily referring to the same embodiment.

The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and processes that are consistent with the invention as claimed herein.

There are two primary components to the management of storage volumes: identification and reclamation. Identification pertains to estimating space savings and performance impact associated with a storage footprint reduction technique. Reclamation pertains to delaying the action of the storage footprint reduction technique until certain criteria are met. Accordingly, these two aspects, identification and reclamation, are decoupled until such time as the storage savings has been deemed beneficial or necessary.

As discussed above, a data center is configured with two or more storage pools. One or more applications may execute with associated data supported by volumes that reside in one or more of these pools. To create storage space in the associated storage pools, compression and/or thin provisioning techniques, the latter also referred to herein as thinning, may be applied on one or more volumes residing in these pools. Application of compression or thinning on all possible volumes in these pools can be impractical in large scale systems, both with respect to the time required and any resulting performance overhead.

With reference to FIG. 1, a flow chart (100) is provided illustrating an overview of the decoupling process. A similarity based sampling approach is employed to mitigate time and performance overhead associated with application of thinning and compression on all possible volumes. As such, the storage volumes in the data center are clustered based on similarity (102). In one embodiment, volumes may be deemed similar if they map to the same application, since this increases the likelihood that they store similar types of data. In one embodiment, volumes may be deemed similar if they exhibit similar random or sequential read and write I/O proportions or properties. For example, two volumes with predominantly random write I/O may be seen as similar. In one embodiment, the Pearson correlation coefficient between the I/O proportions of two or more volumes may be employed as a similarity value. Accordingly, using correlation values, volumes can be grouped in a desired number of clusters.
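The similarity based clustering at step (102) can be illustrated with a minimal sketch. It assumes each volume is summarized by a hypothetical four-element profile of I/O proportions (random read, sequential read, random write, sequential write) and uses a simple greedy grouping against a correlation threshold. The function names, the profile layout, and the threshold of 0.9 are assumptions made for illustration; the specification does not prescribe a particular clustering algorithm.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_a * var_b)


def cluster_volumes(profiles, threshold=0.9):
    """Greedy grouping: a volume joins the first cluster whose first member
    correlates with it at or above `threshold` (an assumed cutoff)."""
    clusters = []
    for name, profile in profiles.items():
        for cluster in clusters:
            if pearson(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical profiles: (random read, sequential read, random write, sequential write)
profiles = {
    "vol_a": (0.10, 0.05, 0.80, 0.05),  # predominantly random write
    "vol_b": (0.12, 0.08, 0.75, 0.05),  # predominantly random write
    "vol_c": (0.05, 0.85, 0.05, 0.05),  # predominantly sequential read
}
print(cluster_volumes(profiles))
```

In this example the two volumes with predominantly random write I/O fall into one cluster and the sequential read volume forms its own. A fixed number of clusters, as mentioned above, would call for a different grouping strategy than this threshold-based one.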

By clustering the volumes, as described above, a subset of volumes in each cluster may be selected for evaluation of compression and thinning. Space savings and performance impact values obtained from the selected volumes in each cluster may be used to estimate similar characteristics for the remaining volumes in the cluster, and in one embodiment may function as guidance for future sampling. The variable Y_(Total) is assigned to the quantity of clusters of volumes formed (104), and an associated cluster counting variable, Y, is initialized (106). For each cluster, a minimum of one volume is employed for evaluation. The variable X_(Total) is assigned to the quantity within a subset of volumes selected for evaluation in cluster_(Y) (108), and an associated volume counting variable, X, is initialized (110). Compression or thinning is applied to a copy of volume_(X) in cluster_(Y) (112) and an associated workload is assigned to the compressed or thinned copy or copies (114). To understand the implications of processing with data that has been compressed or thinned, both space saving data from the compression or thinning is obtained (116) and performance impact data associated with the switch from a non-space reducing state to a space reducing state, such as a compressed or thinned copy, is obtained (118). Data obtained at steps (116) and (118) are stored in an associated knowledge base (120). In one embodiment, the knowledge base may be local to the data center in a volume that is not subject to thinning or compression, or the knowledge base may be external to the data center. In one embodiment, capacity data associated with the space saving data is stored at a first location and the performance impact data is stored at a second location. The first and second locations may be the same location or different locations. Accordingly, both space saving data and performance data associated with the compressed or thinned volume are acquired.

Following step (120), the volume counting variable is incremented (122), followed by determining if there are any other volumes in the cluster that are designated for evaluation (124). As described above, a minimum of one copy of a volume in each cluster is thinned or compressed and associated performance and space saving data is acquired to ascertain the implications of compression or thinning for the cluster. In other words, the thinned or compressed copy is representative of the cluster. If at step (124) it is determined that there are one or more volumes in the same cluster, cluster_(Y), designated for evaluation, then the process returns to step (112). However, if at step (124) it is determined that all of the volumes in the cluster subject to or designated for evaluation have been processed, then the cluster counting variable is incremented (126). As described above, the volumes are separated into clusters, with a minimum of one cluster. Following step (126), it is determined if all of the clusters, and specifically, all of the volumes designated for evaluation in each of the clusters, have been processed (128). A negative response to the determination at step (128) is followed by a return to step (110) for processing of any designated volumes in the next cluster. However, a positive response to the determination at step (128) concludes the processing of volumes.
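The nested iteration over clusters and sampled volumes described for FIG. 1 can be sketched as follows. This is an illustrative outline only: `reduce_copy` and `measure_latency_delta` are hypothetical callables standing in for applying compression or thinning to a volume copy and for replaying the workload against that copy, and the dictionary used as the knowledge base is an assumed representation.

```python
def evaluate_clusters(clusters, reduce_copy, measure_latency_delta,
                      samples_per_cluster=1):
    """Walk the FIG. 1 loop and return a knowledge base keyed by volume name.

    reduce_copy(volume) is assumed to return the space saved on the copy;
    measure_latency_delta(volume) is assumed to return the latency change
    observed when the workload runs against the reduced copy.
    """
    knowledge_base = {}
    for cluster_id, cluster in enumerate(clusters):        # cluster counter Y
        sample = cluster[:max(1, samples_per_cluster)]     # at least one volume
        for volume in sample:                              # volume counter X
            saved_gb = reduce_copy(volume)                 # steps (112), (116)
            latency_delta_ms = measure_latency_delta(volume)  # steps (114), (118)
            knowledge_base[volume] = {                     # step (120)
                "cluster": cluster_id,
                "saved_gb": saved_gb,
                "latency_delta_ms": latency_delta_ms,
            }
    return knowledge_base


# Stub measurements for illustration; real values would come from the copies.
saved = {"vol_a": 12.0, "vol_c": 3.5}
delta = {"vol_a": 0.4, "vol_c": 1.1}
kb = evaluate_clusters([["vol_a", "vol_b"], ["vol_c"]],
                       saved.__getitem__, delta.__getitem__)
print(kb)
```

Only the sampled volume in each cluster is measured; the remaining volumes in the cluster would inherit estimates from that representative.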

The process of evaluating storage volumes is repeated periodically for various reasons, including but not limited to, changes in workload, changes in the data center, etc. Similarly, in one embodiment, the process shown and described in FIG. 1 functions as a background process that is repeated on a periodic basis so that the data in the knowledge base is current. In one embodiment, the process shown and described in FIG. 1 may be activated by an administrator in the event that current data for the knowledge base is required or desired. Accordingly, the process shown and described in FIG. 1 provides performance impact data and space saving data representative of the volumes.

The process shown and described in FIG. 1 may be referred to as a background process. The evaluation is performed on copies of the selected volumes, and does not affect performance on the volume itself. In one embodiment, the data continues to be processed on the non-compressed or non-thinned volumes, while the background process performs the same application execution on the compressed or thinned copies as a sampling technique. The space saving and performance impact data is acquired in the background so that, in the event a space saving technique is required, an educated decision may be made as to which volumes and/or clusters may be compressed or thinned with minimum impact on performance.

Referring to FIG. 2, a flow chart (200) is provided illustrating a process for estimating space savings. At any given point-in-time, space savings from compressing or thinning a volume copy can be estimated using temporal measurements stored in the knowledge base. Data stored in the volumes of a data center is dynamic in that data continues to be read from or written to the subject volumes as applications are processed. As shown and described in FIG. 1, data in the knowledge base is acquired from the background process (210). At the same time, there are live capacity usage and access statistics associated with the volumes in the data center (220). The live data pertains to changes associated with the storage volumes since a prior population of the knowledge base from the background process. Such live data includes, but is not limited to, a quantity of read and write requests, a quantity of data delete requests, etc. In one embodiment, one or more counters are employed to track the live usage data between executions of the background process. Data from the knowledge base at step (210) and from the live statistics at step (220) are received as input to a projection model (230). More specifically, the input data are employed by the projection model to ascertain how much data in one or more volumes of the data center has been subject to change since the prior estimate. In one embodiment, a linear regression model is employed to estimate growth projection at step (230). In one embodiment, I/O access patterns, experienced by one or more volumes between measurements found in the knowledge base and current time, are used to predict a change in space savings since the last measurement. Following the projection at step (230), a priority score is assigned to each volume in the data center subject to evaluation (240). In one embodiment, the volumes may be sorted based on the associated priority scores, which may then be used to efficiently identify one or more volumes for compression or thinning. Accordingly, the estimation process shown herein employs static and dynamic storage data to categorize one or more volumes for a potential space saving technique.
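The linear regression embodiment of the projection at step (230) can be sketched with an ordinary least squares fit over timestamped capacity samples. The sample layout `(time, used_gb)`, the function names, and the use of projected growth over a horizon as the priority score at step (240) are assumptions for illustration; the specification does not define how the priority score is computed.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept).
    Requires at least two distinct x values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_usage(samples, at_time):
    """Project capacity usage at `at_time` from (time, used_gb) samples."""
    xs, ys = zip(*samples)
    slope, intercept = fit_line(xs, ys)
    return slope * at_time + intercept


def rank_by_projected_growth(history, horizon):
    """Order volumes by projected growth over `horizon`, largest first.
    Using growth as the priority score is an assumption of this sketch."""
    scores = {}
    for volume, samples in history.items():
        last_time, last_used = samples[-1]
        scores[volume] = project_usage(samples, last_time + horizon) - last_used
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical usage history: (day, used_gb)
history = {
    "vol_a": [(0, 100), (1, 110), (2, 120), (3, 130)],
    "vol_b": [(0, 50), (1, 52), (2, 54), (3, 56)],
}
print(rank_by_projected_growth(history, horizon=2))
```

Here `vol_a` grows by roughly 10 GB per day and `vol_b` by roughly 2 GB per day, so `vol_a` is ranked first.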

It is important to ensure that pools do not exceed their capacity. In one embodiment, a storage threshold is set to a value lower than the actual capacity to ensure that capacity is not exceeded. For example, in one embodiment, a space savings technique, such as compression or thinning, takes place when the associated storage pool is operating at 80% capacity. Referring to FIG. 3, a flow chart (300) is provided illustrating a process for predicting a storage threshold violation. As shown, input for the violation prediction comes in at least three forms of data, including but not limited to, expected new storage allocations (310), storage pool capacity usage threshold(s) (320), and capacity usage growth (330). In one embodiment, the expected allocations at step (310) are provided by an administrator or are predicted based on allocation history. In one embodiment, the capacity at step (320) is a fixed value based on the size of the associated storage volume(s), although in one embodiment, this value may be subject to change based on data transfer and/or compression or thinning. In one embodiment, the capacity usage growth (330) pertains to fluctuations in the range of storage pool usage. For example, usage may fluctuate if a volume has been added to or removed from the pool. Data from (310), (320), and (330) are received as input for predicting a time for violation of the storage pool capacity (340). Output (350) from the prediction step (340) is generated in the form of storage pool and time until threshold violation. More specifically, the violation prediction at step (340) provides output data (350) in the form of a time estimate at which the capacity will be exceeded. In one embodiment, the time estimate may be on a per volume basis, a cluster of volumes basis, or a storage pool basis. Accordingly, the process shown and described herein is employed to predict a time of threshold violation based on a plurality of factors, including accommodating for fluctuations in usage.
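A minimal sketch of the prediction at step (340) is given below, assuming a constant growth rate and treating expected new allocations as an immediate reduction in headroom. The function name, parameter names, and the constant-rate model are assumptions for illustration and are not taken from the specification.

```python
def days_until_violation(capacity_gb, used_gb, threshold,
                         growth_gb_per_day, expected_alloc_gb=0.0):
    """Estimate days until pool usage crosses threshold * capacity.

    Inputs loosely mirror FIG. 3: expected allocations (310), the usage
    threshold (320), and capacity usage growth (330). Constant growth is
    an assumption of this sketch.
    """
    headroom_gb = threshold * capacity_gb - used_gb - expected_alloc_gb
    if headroom_gb <= 0:
        return 0.0            # threshold already reached or exceeded
    if growth_gb_per_day <= 0:
        return float("inf")   # no growth, so no predicted violation
    return headroom_gb / growth_gb_per_day


# Hypothetical pool: 1000 GB capacity, 80% threshold, 700 GB used,
# 50 GB of expected new allocations, growing 5 GB per day.
print(days_until_violation(1000, 700, 0.8, 5, expected_alloc_gb=50))
```

With these numbers the headroom is 800 - 700 - 50 = 50 GB, which at 5 GB per day gives an estimate of about 10 days.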

One of the goals of creation, maintenance, and utilization of the knowledge base is to predict and ensure that storage volume thresholds are not violated. Referring to FIG. 4, a flow chart (400) is provided illustrating managing storage capacity. Four elements are employed as input data for minimizing performance degradation associated with storage volume management, also referred to as optimization. Input for the optimization includes the time estimate for capacity violation (410), as shown and described in FIG. 3, the estimated space saving(s) (420), as shown and described in FIG. 2, acceptable pool thresholds (430), and administrator management policies (440). The pool thresholds (430) may be static values, or in one embodiment, dynamic values. In one embodiment, the policies at step (440) pertain to guidance on compression or thinning, as each of these forms of space savings is different and may vary in impact. In one embodiment, the space savings at step (420) may be different depending on the technique employed. Data from steps (410)-(440) are subject to optimization (450) for minimizing performance degradation. Output from the optimization (460) includes a prioritized list of all storage footprint reducing actions that can be taken for each storage pool. As shown herein, three storage pools (462), (464), and (466) are shown sorted based on prioritization for footprint reduction. In one embodiment, each storage pool's volume has an action list that is sorted based on priority. For example, the sorting may be based on the product of volume space savings and I/O latency increase. In one embodiment, the sorting is conducted in an order that executes one or more actions that generate savings from the greatest quantity of savings to the least quantity of savings. In one embodiment, the sorting of the action list brings efficiency into the volume selection process, wherein the list exhibits a prioritization of the volumes. Accordingly, output from the optimization orders the storage pools under investigation.
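The ordering of a pool's action list can be sketched as a simple sort. The sketch below follows the embodiment that orders actions from the greatest to the least space savings and, as an added assumption, breaks ties in favor of the smaller I/O latency increase. The record fields and function name are hypothetical. The product-based ordering mentioned above is not shown, since the text does not state in which direction that product is ranked.

```python
def prioritize_actions(actions):
    """Sort space reduction actions by savings, greatest first.
    Ties go to the action with the smaller latency increase (an assumption)."""
    return sorted(actions,
                  key=lambda a: (-a["saved_gb"], a["latency_increase_ms"]))


# Hypothetical action list for one storage pool.
actions = [
    {"volume": "v1", "technique": "compress", "saved_gb": 40, "latency_increase_ms": 2.0},
    {"volume": "v2", "technique": "thin", "saved_gb": 65, "latency_increase_ms": 0.5},
    {"volume": "v3", "technique": "compress", "saved_gb": 40, "latency_increase_ms": 1.0},
]
ordered = prioritize_actions(actions)
print([a["volume"] for a in ordered])
```

In this example `v2` is ranked first on savings, and `v3` precedes `v1` because both save the same amount while `v3` adds less latency.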

As shown, the optimization at step (460) filters actions based on feasibility of completion, which in one embodiment uses a model for a projected completion time for space reductions. Specifically, if the space reduction cannot take place in the time required, there may be a space violation. Projection of the completion time may affect the sorting of the list of storage pools (462), (464), and (466). In one embodiment, the quantity of storage pools in the list may vary, and as such, the quantity shown and described herein is merely an example and should not be considered limiting. Following output from the optimization at step (460), one or more volume storage reducing actions are executed for each designated storage pool. More specifically, the variable N_(Total) is assigned to the quantity of storage pools in the ordered output list (470), and an associated storage pool counting variable, N, is initialized (472). The storage reducing action(s) is executed on storage pool_(N) (474). It is then determined if an acceptable storage pool usage threshold is reached so that no further action is required at this point-in-time (476). A negative response to the determination at step (476) is followed by incrementing the storage pool counting variable (478) and a return to step (474). However, a positive response to the determination at step (476) concludes the storage pool reduction actions (480). In one embodiment, storage reduction actions for the pool with the smallest time to threshold violation are executed first, followed by the next smallest time to threshold violation, etc. Accordingly, the footprint of available storage space is managed in a methodical manner to effectively and efficiently enable continued storage of data with minimal impact on storage performance.
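The feasibility filter and the execution order described above can be outlined as follows. The sketch assumes each pool record carries its predicted time to violation, its current usage, its acceptable usage threshold, and an already prioritized action list, and that each action carries an estimated completion time. All field names are hypothetical. Checking the threshold per pool after each action is one possible reading of the determination at step (476).

```python
def reclaim(pools):
    """Execute feasible actions, most urgent pool first.

    Returns the (pool, volume) pairs that were executed. Pool usage is
    updated in place by the estimated savings of each executed action.
    """
    executed = []
    # Smallest time to threshold violation is handled first.
    for pool in sorted(pools, key=lambda p: p["days_to_violation"]):
        for action in pool["actions"]:
            if pool["used_gb"] <= pool["threshold_gb"]:
                break  # acceptable usage reached; no further action needed
            if action["est_days"] > pool["days_to_violation"]:
                continue  # cannot complete in time; filtered as infeasible
            pool["used_gb"] -= action["saved_gb"]
            executed.append((pool["name"], action["volume"]))
    return executed


# Hypothetical pools with prioritized action lists.
pools = [
    {"name": "p1", "days_to_violation": 20, "used_gb": 900, "threshold_gb": 800,
     "actions": [{"volume": "v1", "saved_gb": 60, "est_days": 1},
                 {"volume": "v2", "saved_gb": 50, "est_days": 2},
                 {"volume": "v3", "saved_gb": 40, "est_days": 1}]},
    {"name": "p2", "days_to_violation": 5, "used_gb": 510, "threshold_gb": 500,
     "actions": [{"volume": "v4", "saved_gb": 30, "est_days": 9},
                 {"volume": "v5", "saved_gb": 20, "est_days": 1}]},
]
executed = reclaim(pools)
print(executed)
```

Pool `p2` is processed first because it is closest to violation; its first action is skipped as infeasible, and processing of `p1` stops once its usage falls to the threshold, leaving `v3` untouched.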

The processes shown and described in FIGS. 1-4 illustrate decoupling estimation and identification of storage footprint reduction from actual reclamation of storage space. This decoupling introduces a model based approach to address the dynamic characteristics of data storage. More specifically, capacity savings are attained in a persistent storage medium through capacity reducing optimizations, such as thinning and compression.

Referring to FIG. 5, a block diagram (500) is provided illustrating components of a storage pool capacity management system. As shown, a processing node (510) is shown in communication with a data center (550). The processing node (510) is provided with a processor (512), also referred to herein as a processing unit, operatively coupled to memory (516) across a bus (514). The processing node (510) is further provided in communication with other nodes (520), which are each in communication with persistent storage maintained in the data center (550). Processing node (510) is responsible for the storage and maintenance of data in the data center (550). More specifically, node (510) is provided with one or more tools to support storage pool capacity management based on de-coupling capacity estimation from capacity saving execution. As shown herein, and described in detail below, the tools embody an adaptive system comprised of two modules, including a decouple module (530) and a selection module (540). The decouple module (530) functions to estimate capacity savings from one or more space reduction actions. The selection module (540) functions to dynamically select and execute a subset of the space reduction actions based on predicted threshold violations.

As shown, the data center (550) is configured with a plurality of persistent storage volumes (552), (554), (556), (558), and (560). Although only five volumes are shown and described, this quantity should not be considered limiting. In one embodiment, the data center (550) includes a controller (570) to facilitate management of the storage volumes. As shown, the storage controller (570) is shown with a processor (572) operatively coupled to memory (576) via a bus (574). The controller (570) is in communication with the modules (530) and (540). More specifically, management control is communicated to the controller via the modules to facilitate execution of any management actions on the storage volumes.

The decouple module (530) functions to separate the estimation of capacity savings associated with one or more space reduction actions from actual execution of these actions. In the separation process, the decouple module (530) employs a similarity based sampling approach so that volumes in the data center (550) may be placed into groups, also referred to as clusters, based on similarity, such as volumes that map to the same application, exhibit similar random and sequential read and write I/O proportions, etc. For example, as shown herein, volumes (552) and (554) are placed in a first group, group_(A) (580), and volumes (556), (558), and (560) are placed in a second group, group_(B) (582). Although only two groups are shown, this quantity is an example and should not be considered limiting. At the same time, the grouping of volumes is not static, and is subject to change. Data obtained from analysis of one or more volumes in a cluster may be extrapolated to other volumes in the cluster, based upon the similarity protocol. Accordingly, analysis may be limited to a subset of volumes in any given cluster.

As shown in FIG. 5, one or more volumes in a cluster are selected for analysis associated with capacity management. The decouple module (530) is responsible for capacity management, and more specifically, for performing a space reduction action on at least one selected volume in a cluster and a study associated with effects on the storage system and storage performance associated with the space reduction. More specifically, the decouple module measures capacity savings associated with the space reduction, switches an associated workload to a copy of the reduced volume, measures performance from the switched workload, records any performance degradation, and then removes the reduction so that the system may revert back to a prior state. Data gathered by the decouple module (530) enables predictions associated with storage capacity management to be made and executed with careful consideration. The predictions may be converted to actions to ensure available capacity for data storage. Based upon the similarity between volumes within a cluster, the decouple module (530) may infer capacity savings and performance degradation for a cluster or a volume in the cluster based on measurements acquired from the volume in the cluster that was subject to thinning or compression.

Storage capacity threshold is an important factor that is managed so that there is sufficient storage space to manage data processing. In one embodiment, the threshold relates to a percentage of space remaining in a storage volume. Any time a volume is compressed or thinned there will be a negative effect on performance. The goal is to compress or thin one or more volumes when required. As such, there is a balancing act that is performed between the decouple module (530) and the selection module (540), with the decouple module (530) running in the background and the selection module (540) running in the foreground. The selection module (540) selects one or more volumes from the storage pool for compression or thinning based on a predicted capacity threshold violation. In one embodiment, the decouple module (530) creates and maintains a list (590) of candidate volumes in each pool for space reduction. The list (590) corresponds to capacity savings and performance measurements. In one embodiment, the list (590) is sorted and a priority is assigned to select volumes in the list (590). The selection of storage volumes and execution by the selection module (540) takes place on-demand. In one embodiment, the selection module (540) conducts its selection based on the sorted list (590). The list (590) is shown embedded in memory (516), although in one embodiment, the list (590) may be stored local to the data center (550) or local to the controller (570).

The volumes in the storage pool may be a static quantity, although in one embodiment, volumes may be added to or removed from the storage pool. Communication with the volumes is ongoing. One or more processing nodes communicate with the storage pool to support application processing that requires read and/or write operations to one or more storage volumes. The flow of communications between the processing node(s) and the storage pool is referred to as I/O. In one embodiment, an I/O pattern may be visualized between the processing node(s) and associated storage volumes. The decouple module (530) may utilize the I/O pattern to predict a change in space savings and usage since a prior measurement of a volume in the cluster. More specifically, the decouple module (530) may update the measurements, thereby creating measured data associated with the volumes after the estimation, with the measured data associated with the update based on the I/O pattern. In one embodiment, the update of the measurements includes an invalidation of any prior measurements. Accordingly, measurement data from sampling volumes and assessing storage capacity may be updated based upon actual access patterns for the volumes.

The system described in FIG. 5 has been labeled with tools in the form of modules (530) and (540). The tools may be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like. The tools may also be implemented in software for execution by various types of processors. An identified functional unit of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be organized as an object, procedure, function, or other construct. Nevertheless, the executable code of the tools need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, comprise the tools and achieve the stated purpose of the tools.

Indeed, executable code could be a single instruction, or many instructions, and may even be distributed over several different code segments, among different applications, and across several memory devices. Similarly, operational data may be identified and illustrated herein within the tool, and may be embodied in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed over different locations including over different storage devices, and may exist, at least partially, as electronic signals on a system or network.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of agents, to provide a thorough understanding of embodiments. One skilled in the relevant art will recognize, however, that the embodiment(s) can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the embodiment(s).

The tools shown and described herein support management of storage volume capacity in a pool of multiple storage volumes, and adaptively select one or more volumes for space reduction based on a predicted threshold violation. As described above, the evaluation of the ramifications associated with space reduction is performed as a background operation so that the volumes and associated clusters may be ranked and sorted, and the selection of a volume for space reduction is based on the ranking and sorting. In one embodiment, the ranking and sorting is on a per volume basis, and in one embodiment it is expanded to include ranking and sorting the clusters in which the volumes are organized. Similarly, in one embodiment, the functionality and support of the capacity management and selection of volumes for space reduction in support of the management may be extrapolated to a cloud computing environment with a shared pool of resources.
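A minimal sketch of predicting a threshold violation and ranking candidate volumes is given below for illustration. The linear growth model, the sort key, and the names `Candidate`, `days_until_violation`, and `rank_candidates` are assumptions introduced here and are not taken from the embodiments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical per-volume estimate produced by the background evaluation."""
    name: str
    est_savings_gb: float          # estimated capacity saved by space reduction
    est_latency_penalty_ms: float  # estimated added I/O latency


def days_until_violation(used_gb: float, threshold_gb: float,
                         growth_gb_per_day: float) -> Optional[float]:
    """Project time until the pool threshold is crossed (linear growth assumed)."""
    if used_gb >= threshold_gb:
        return 0.0   # threshold already violated
    if growth_gb_per_day <= 0:
        return None  # no violation projected
    return (threshold_gb - used_gb) / growth_gb_per_day


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by largest estimated savings, breaking ties by smallest penalty."""
    return sorted(candidates,
                  key=lambda c: (-c.est_savings_gb, c.est_latency_penalty_ms))


pool = [
    Candidate("vol-a", 40.0, 2.0),
    Candidate("vol-b", 120.0, 5.0),
    Candidate("vol-c", 120.0, 1.5),
]
print(days_until_violation(used_gb=800.0, threshold_gb=900.0, growth_gb_per_day=10.0))
print([c.name for c in rank_candidates(pool)])
```

Other orderings, such as savings per unit of performance penalty, would fit the same structure; the key function is the only part that changes.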

A cloud computing environment is service oriented with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure comprising a network of interconnected nodes. Referring now to FIG. 6, a schematic of an example of a cloud computing node is shown. Cloud computing node (610) is only one example of a suitable cloud computing node and is not intended to suggest any limitation as to the scope of use or functionality of embodiments described herein. Regardless, cloud computing node (610) is capable of being implemented and/or performing any of the functionality set forth hereinabove. In cloud computing node (610) there is a computer system/server (612), which is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server (612) include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.

Computer system/server (612) may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server (612) may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

As shown in FIG. 6, computer system/server (612) in cloud computing node (610) is shown in the form of a general-purpose computing device. The components of computer system/server (612) may include, but are not limited to, one or more processors or processing units (616), system memory (628), and a bus (618) that couples various system components including system memory (628) to processor (616). Bus (618) represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus. Computer system/server (612) typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server (612), and it includes both volatile and non-volatile media, and removable and non-removable media.

System memory (628) can include computer system readable media in the form of volatile memory, such as random access memory (RAM) (630) and/or cache memory (632). Computer system/server (612) may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system (634) can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus (618) by one or more data media interfaces. As will be further depicted and described below, memory (628) may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of the embodiment(s).

Program/utility (640), having a set (at least one) of program modules (642), may be stored in memory (628) by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating systems, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules (642) generally carry out the functions and/or methodologies of embodiments as described herein.

Computer system/server (612) may also communicate with one or more external devices (614), such as a keyboard, a pointing device, a display (624), etc.; one or more devices that enable a user to interact with computer system/server (612); and/or any devices (e.g., network card, modem, etc.) that enable computer system/server (612) to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces (622). Still yet, computer system/server (612) can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter (620). As depicted, network adapter (620) communicates with the other components of computer system/server (612) via bus (618). It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server (612). Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.

Referring now to FIG. 7, illustrative cloud computing environment (750) is depicted. As shown, cloud computing environment (750) comprises one or more cloud computing nodes (710) with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone (754A), desktop computer (754B), laptop computer (754C), and/or automobile computer system (754N) may communicate. Nodes (710) may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows cloud computing environment (750) to offer infrastructure, platforms, and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices (754A)-(754N) shown in FIG. 7 are intended to be illustrative only and that computing nodes (710) and cloud computing environment (750) can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).

Referring now to FIG. 8, a set of functional abstraction layers provided by cloud computing environment (800) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 8 are intended to be illustrative only and embodiments are not limited thereto. As depicted, the following layers and corresponding functions are provided: hardware and software layer (810), virtualization layer (820), management layer (830), and workload layer (840). The hardware and software layer (810) includes hardware and software components. Examples of hardware components include mainframes, in one example IBM® zSeries® systems; RISC (Reduced Instruction Set Computer) architecture based servers, in one example IBM pSeries® systems; IBM xSeries® systems; IBM BladeCenter® systems; storage devices; networks and networking components. Examples of software components include network application server software, in one example IBM WebSphere® application server software; and database software, in one example IBM DB2® database software. (IBM, zSeries, pSeries, xSeries, BladeCenter, WebSphere, and DB2 are trademarks of International Business Machines Corporation registered in many jurisdictions worldwide).

Virtualization layer (820) provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients. In one example, a management layer (830) may provide the following functions: resource provisioning, metering and pricing, user portal, service level management, and key management. The functions are described below. Resource provisioning provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and pricing provides cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal provides access to the cloud computing environment for consumers and system administrators.

Workloads layer (840) provides examples of functionality for which the cloud computing environment may be utilized. In the shared pool of configurable computer resources described herein, hereinafter referred to as a cloud computing environment, files may be shared among users within multiple data centers, also referred to herein as data sites. Accordingly, a series of mechanisms are provided within the shared pool to support organization and management of data storage within the cloud computing environment.

The processes shown and described herein address components that function to manage storage pool capacity. Specifically, there is the background process to obtain accurate estimates by actually compressing or thinning at least one volume per pool and acquiring data associated with transactions on the compressed or thinned volume(s). The acquired data is then employed to estimate behavior of other volumes in the same pool. The foreground process employs the background data to address capacity saving execution. The background and foreground processes decouple estimation from actual reclamation.
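The decoupling of estimation from reclamation may be illustrated with the following minimal sketch. It assumes that the savings ratio measured on one sampled volume applies uniformly to the other volumes in the pool and that the foreground step selects volumes greedily; the function names `background_estimate` and `foreground_select` are hypothetical.

```python
from typing import Dict, List


def background_estimate(sample_before_gb: float, sample_after_gb: float,
                        other_volumes_gb: Dict[str, float]) -> Dict[str, float]:
    """Background step: measure the savings ratio on a reduced copy of one
    sampled volume and extrapolate it to other volumes in the same pool."""
    ratio = 1.0 - (sample_after_gb / sample_before_gb)
    return {name: size * ratio for name, size in other_volumes_gb.items()}


def foreground_select(estimates: Dict[str, float], needed_gb: float) -> List[str]:
    """Foreground step: use the stored estimates to choose volumes for
    reduction until the required capacity is expected to be reclaimed."""
    chosen: List[str] = []
    reclaimed = 0.0
    for name, saving in sorted(estimates.items(),
                               key=lambda kv: kv[1], reverse=True):
        if reclaimed >= needed_gb:
            break
        chosen.append(name)
        reclaimed += saving
    return chosen


# The sampled copy shrank from 100 GB to 50 GB in the background.
estimates = background_estimate(100.0, 50.0, {"v1": 200.0, "v2": 50.0})
print(estimates)
print(foreground_select(estimates, needed_gb=90.0))
```

Because the foreground step only reads stored estimates, it does not need to compress or thin anything in order to decide which volumes to act on.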

The present invention as shown and described in detail in the figures may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowcharts and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in the flowcharts and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowcharts and/or block diagram block or blocks.

The flowcharts and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. Accordingly, the implementation of the background process enables the storage volumes to continue supporting applications, while capacity management functions to ensure availability of sufficient storage space.

It will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. Accordingly, the scope of protection of this invention is limited only by the following claims and their equivalents.

What is claimed is:
 1. A computer implemented method for managing capacity of a storage pool comprising: processing non-compressed data and maintaining live usage statistics associated with volumes in the storage pool operating at a first state; maintaining a list of candidate volumes in each storage pool for space reduction, the list associated with corresponding capacity savings and performance measurements; prioritizing volumes for each storage pool in the list as a function of storage growth projection; selecting a volume from the storage pool, and performing a first space reduction action on a copy of the selected volume, including: measuring a capacity change associated with the selected volume; measuring performance change from a workload on the selected volume; and producing a subset of optimal space reduction actions for execution, the actions based on the recorded capacity and performance change data; and on-demand, executing at least one action from the produced subset of optimal space reduction actions on a corresponding volume in the storage pool in the first state, the execution selectively converting one or more non-compressed volumes in the storage pool to a second state, wherein the second state is selected from the group consisting of: compressed and thinned.
 2. The method of claim 1, further comprising periodically evaluating storage volume and storage performance measurements as a background process.
 3. The method of claim 1, further comprising predicting a storage threshold violation, including storage pool and time until threshold violation.
 4. The method of claim 3, wherein the storage threshold violation prediction includes a time estimate of storage pool capacity, and wherein the estimate is on a basis selected from the group consisting of volume, cluster of volumes, and storage pool.
 5. The method of claim 1, further comprising inferring capacity saving and performance degradation for a non-selected volume in the pool, wherein the inference is based on the measurements from the selected volume.
 6. The method of claim 1, further comprising predicting a change in space saving since a prior measurement, the prediction employing an I/O access pattern observed for each volume.
 7. A computer program product for managing capacity of a storage pool, the computer program product comprising a computer readable storage device having program code embodied therewith, the program code executable by a processing unit to: process non-compressed data and maintaining live usage statistics associated with volumes in the storage pool operating at a first state; maintain a list of candidate volumes in each storage pool for space reduction, the list associated with corresponding capacity savings and performance measurements; prioritize volumes for each storage pool in the list as a function of storage growth projection; select a volume from the storage pool and perform a first space reduction action on a copy of the selected volume, including: measure a capacity change associated with the selected volume, and record capacity change data in a first location; measure performance change from a switched workload on the selected volume, and record performance change data in a second location; and produce a subset of optimal space reduction actions for execution, the actions based on the recorded capacity and performance change data; and on-demand, execute at least one action from the produced subset of optimal space reduction actions on a corresponding volume in the storage pool in the first state, the execution to selectively convert one or more non-compressed volumes in the storage pool to a second state, wherein the second state is selected from the group consisting of: compressed and thinned.
 8. The computer program product of claim 7, further comprising program code to periodically evaluate storage volume and storage performance measurements as a background process.
 9. The computer program product of claim 7, further comprising program code to predict a storage threshold violation, including storage pool and time until threshold violation.
 10. The computer program product of claim 9, wherein the storage threshold violation prediction includes a time estimate of storage pool capacity, and wherein the estimate is on a basis selected from the group consisting of volume, cluster of volumes, and storage pool.
 11. The computer program code of claim 7, further comprising program code to infer capacity saving and performance degradation for a non-selected volume in the pool, wherein the inference is based on the measurements from the selected volume.
 12. The computer program code of claim 7, further comprising program code to: predict a change in space saving since a prior measurement, the prediction employing an I/O access pattern observed for each volume; and periodically update the measurements, including invalidating any prior measurement data.
 13. A computer system comprising: a processing unit operatively coupled to memory; a storage pool, having two or more storage volumes, operatively coupled to the processing unit, wherein non-compressed data is processed and live usage statistics associated with the volumes in the storage pool operating at a first state is maintained; a tool in communication with the processing unit to manage capacity of the storage pool, including the tool to: maintain a list of candidate volumes in each storage pool for space reduction, the list associated with corresponding capacity savings and performance measurements; prioritize volumes for each storage pool in the list as a function of storage growth projection; select a volume from the storage pool, and perform a first space reduction action on a copy of the selected volume, including: measure a capacity change associated with the selected volume, and record capacity change data in a first location; measure performance change from a workload on the selected volume, and record performance change data in a second location; and produce a subset of optimal space reduction actions for execution, the actions based on the recorded capacity and performance change data; and a selection module to on-demand, execute at least one action on the produced subset of optimal space reduction actions on a corresponding volume in the storage pool in the first state, the execution selectively converting one or more non-compressed volumes in the storage pool to a second state, wherein the second state is selected from the group consisting of: compressed and thinned.
 14. The system of claim 13, further comprising the tool to periodically evaluate storage volume and storage performance measurements as a background process.
 15. The system of claim 13, further comprising the tool to predict a storage threshold violation, including storage pool and time until threshold violation.
 16. The system of claim 15, wherein the storage threshold violation prediction includes a time estimate of storage pool capacity, and wherein the estimate is on a basis selected from the group consisting of volume, cluster of volumes, and storage pool.
 17. The system of claim 13, further comprising the tool to infer capacity saving and performance degradation for a non-selected volume in the pool, wherein the inference is based on the measurements from the selected volume.
 18. The system of claim 13, further comprising the tool to: predict a change in space saving since a prior measurement, the prediction employing an I/O access pattern observed for each volume; and periodically update the measurements, including invalidating any prior measurement data.